Convolutional Neural Network

Extract, composite, and match simple shapes.

Model

$$ \begin{aligned} I_n &\to \boxed{\phi_1, \dots, \phi_K} \to M_n = f(I_n; \phi_1, \dots, \phi_K) \\ &\to \boxed{\psi_1, \dots, \psi_K} \to L_n = f(M_n; \psi_1, \dots, \psi_K) \\ &\to \boxed{\omega_1, \dots, \omega_K} \to G_n = f(L_n; \omega_1, \dots, \omega_K) \\ &\to \boxed{W} \to p_n = g(G_n; W) \\ \end{aligned} $$

Risk Function

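$$ \begin{aligned} \mathcal{L}(\Phi, \Psi, \Omega, W) &= \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, p_n) \end{aligned} $$

Here $ \Phi = \{\phi_1, \dots, \phi_K\} $, $ \Psi = \{\psi_1, \dots, \psi_K\} $, and $ \Omega = \{\omega_1, \dots, \omega_K\} $ collect the filters of the three stages in the model, $ y_n $ is the label of sample $ n $, and $ \ell $ is a per-sample loss.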
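The risk is just the average per-sample loss over the $ N $ training samples. A minimal sketch follows, assuming $ \ell $ is the cross-entropy loss and each $ p_n $ is a list of class probabilities; the notes do not fix $ \ell $, so both choices, and the names `cross_entropy` and `empirical_risk`, are illustrative only.

```python
import math


def cross_entropy(y, p, eps=1e-12):
    """Per-sample loss l(y_n, p_n): negative log-probability of the true class.

    y is an integer class index, p a list of class probabilities.
    Cross-entropy is an assumed choice; the notes leave l unspecified.
    """
    return -math.log(max(p[y], eps))  # eps guards against log(0)


def empirical_risk(labels, predictions, loss=cross_entropy):
    """Average the per-sample loss over all N samples."""
    n = len(labels)
    return sum(loss(y, p) for y, p in zip(labels, predictions)) / n


# Two samples: true classes 0 and 1, with predicted class probabilities.
labels = [0, 1]
predictions = [[0.5, 0.5], [0.25, 0.75]]
risk = empirical_risk(labels, predictions)
```

Any other per-sample loss with the same `(y, p)` signature can be passed through the `loss` argument.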

Parameters

$$ \begin{aligned} (W, H, C_{\text{in}}, C_{\text{out}}) \end{aligned} $$

$W, H$: Input image width and height (this $W$ is unrelated to the classifier weights $W$ in the model above)
$ C_{\text{in}}, C_{\text{out}} $: Number of input and output channels
Kernel Size: $ k $ ; Padding Size: $ p $ ; Stride: $ s $

Output Size

"A guide to convolution arithmetic for deep learning"

The leading $ 1 $ counts the kernel's first position. The floor $ \lfloor \cdot \rfloor $ discards a final partial stride: a position where the kernel would overrun the (padded) input produces no output. $$ \begin{aligned} o &= 1 + \left\lfloor \frac{i + 2p - k}{s} \right\rfloor &\text{[for convolution]} \\ o &= 1 + \left\lfloor \frac{i - k}{s} \right\rfloor &\text{[for pooling, no padding]} \end{aligned} $$

$ o $ : Output Size
$ i $ : Input Size
$ k $ : Kernel Size
$ s $ : Stride Size
$ p $ : Padding Size
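
The output-size formula translates directly into integer arithmetic. The sketch below is illustrative: the function names `conv_output_size` and `pool_output_size` are made up for this note, and it treats one spatial dimension at a time (apply it separately to width and height).

```python
def conv_output_size(i, k, s=1, p=0):
    """Output size o = 1 + floor((i + 2p - k) / s) along one dimension."""
    if k > i + 2 * p:
        raise ValueError("kernel is larger than the padded input")
    # Floor division implements the floor; the numerator is non-negative here.
    return 1 + (i + 2 * p - k) // s


def pool_output_size(i, k, s):
    """Pooling uses the same formula with no padding (p = 0)."""
    return conv_output_size(i, k, s=s, p=0)


o_plain = conv_output_size(32, 5)             # 1 + (32 - 5) // 1
o_strided = conv_output_size(5, 3, s=2, p=1)  # 1 + (5 + 2 - 3) // 2
o_floor = conv_output_size(6, 3, s=2)         # 1 + 3 // 2, partial stride dropped
o_pool = pool_output_size(28, 2, 2)           # 1 + (28 - 2) // 2
```

In the `o_floor` case the kernel fits at offsets 0 and 2 only; a third position at offset 4 would need an input of size 7, which is what the floor accounts for.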

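$$ \begin{aligned} p &= \left\lfloor \frac{k}{2} \right\rfloor &\text{[half padding]} \\ p &= k - 1 &\text{[full padding]} \end{aligned} $$

With unit stride ($ s = 1 $) and odd $ k $, half padding keeps the size unchanged ($ o = i $), while full padding grows it to $ o = i + k - 1 $.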
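
As a quick numerical check, the two padding choices can be plugged into the output-size formula with $ s = 1 $. This is a small sketch with hypothetical helper names (`half_padding`, `full_padding`); it repeats the output-size function so that it runs on its own.

```python
def conv_output_size(i, k, s=1, p=0):
    """Output size o = 1 + floor((i + 2p - k) / s) along one dimension."""
    return 1 + (i + 2 * p - k) // s


def half_padding(k):
    """p = floor(k / 2)."""
    return k // 2


def full_padding(k):
    """p = k - 1."""
    return k - 1


i = 7  # example input size
# Map each odd kernel size to (output with half padding, output with full padding).
sizes = {
    k: (conv_output_size(i, k, p=half_padding(k)),
        conv_output_size(i, k, p=full_padding(k)))
    for k in (1, 3, 5)
}

# With an even kernel, half padding overshoots by one: o = i + 1.
even_half = conv_output_size(i, 4, p=half_padding(4))
```

For odd $ k $ the first entry of each pair equals $ i $ and the second equals $ i + k - 1 $. The even-kernel case shows why half padding is normally paired with odd kernel sizes.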